Spark History Server
Spark History Server is the web UI for completed and running (aka incomplete) Spark applications. It is an extension of Spark’s web UI.
Tip: Enable collecting events in your Spark applications using the spark.eventLog.enabled Spark property.
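As a rough illustration of the tip above, event logging can be switched on for a single application at submit time. This is a minimal sketch: APP_CLASS and APP_JAR are hypothetical placeholders, and the command is only executed when SPARK_HOME points at a Spark installation (otherwise it is printed).

```shell
# Property that turns on event collection for one application.
EVENT_LOG_CONF="spark.eventLog.enabled=true"

# Hypothetical application coordinates, used only for illustration.
APP_CLASS="${APP_CLASS:-com.example.MyApp}"
APP_JAR="${APP_JAR:-my-app.jar}"

if [ -x "${SPARK_HOME:-}/bin/spark-submit" ]; then
  # Submit the application with event logging enabled.
  "$SPARK_HOME/bin/spark-submit" --conf "$EVENT_LOG_CONF" --class "$APP_CLASS" "$APP_JAR"
else
  # No Spark installation found: show the command instead of running it.
  echo "spark-submit --conf $EVENT_LOG_CONF --class $APP_CLASS $APP_JAR"
fi
```

Setting the same property in the configuration file used by your applications would enable it for every submission instead of one.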
You can start History Server by executing the start-history-server.sh shell script and stop it using stop-history-server.sh.
start-history-server.sh accepts the --properties-file [propertiesFile] command-line option, which specifies a properties file with custom Spark properties.
$ ./sbin/start-history-server.sh --properties-file history.properties
If not specified explicitly, Spark History Server uses the default configuration file, i.e. spark-defaults.conf.
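The --properties-file flow can be sketched as follows. The file name history.properties comes from the example above; the port value 18081 is an arbitrary illustrative choice, and the start script is only invoked when SPARK_HOME points at a Spark installation.

```shell
# Write custom Spark properties for History Server to a file.
# The port value is an arbitrary example, not a recommendation.
PROPS_FILE="${PROPS_FILE:-history.properties}"
cat > "$PROPS_FILE" <<'EOF'
spark.history.ui.port 18081
EOF

# Start History Server with the custom properties, but only when a
# Spark installation is available under SPARK_HOME.
if [ -x "${SPARK_HOME:-}/sbin/start-history-server.sh" ]; then
  "$SPARK_HOME/sbin/start-history-server.sh" --properties-file "$PROPS_FILE"
else
  echo "SPARK_HOME is not set; wrote $PROPS_FILE only"
fi
```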
Tip: Enable Add the following line to
Refer to Logging.
Starting History Server — start-history-server.sh script
You can start a HistoryServer instance by executing the $SPARK_HOME/sbin/start-history-server.sh script (where SPARK_HOME is the directory of your Spark installation).
$ ./sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to .../spark/logs/spark-jacek-org.apache.spark.deploy.history.HistoryServer-1-japila.out
Internally, the start-history-server.sh script launches the org.apache.spark.deploy.history.HistoryServer standalone application (using the spark-daemon.sh shell script).
$ ./bin/spark-class org.apache.spark.deploy.history.HistoryServer
Tip: Starting Spark History Server explicitly with spark-class can make execution easier to trace, because the logs are printed to standard output and hence directly to the terminal.
When started, it prints out the following INFO message to the logs:
INFO HistoryServer: Started daemon with process name: [processName]
It registers signal handlers (using SignalUtils) for TERM, HUP, INT to log their execution:
ERROR HistoryServer: RECEIVED SIGNAL [signal]
It initializes security if enabled (using the spark.history.kerberos.enabled setting).
Caution: FIXME Describe initSecurity
It creates a SecurityManager.
It creates an ApplicationHistoryProvider (by reading spark.history.provider).
It creates a HistoryServer and requests it to bind to spark.history.ui.port port.
Tip: The host’s IP can be specified using
You should see the following INFO message in the logs:
INFO HistoryServer: Bound HistoryServer to [host], and started at [webUrl]
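When scripting around startup, the web URL can be pulled out of the log line above. A minimal sketch, assuming the line format shown above; the host and URL in the sample line are made-up values, not real output.

```shell
# Sample line in the documented format; host and URL are made-up values.
LOG_LINE='INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://localhost:18080'

# Keep only the text after "and started at", i.e. the web URL.
WEB_URL="$(printf '%s\n' "$LOG_LINE" \
  | sed -n 's/.*Bound HistoryServer to .*, and started at \(.*\)$/\1/p')"

echo "$WEB_URL"
```

In practice the line would come from the History Server log file rather than a hard-coded string.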
It registers a shutdown hook to call stop on the HistoryServer instance.
Tip: Use the stop-history-server.sh shell script to stop a running History Server.
Stopping History Server — stop-history-server.sh script
You can stop a running instance of HistoryServer using the $SPARK_HOME/sbin/stop-history-server.sh shell script.
$ ./sbin/stop-history-server.sh
stopping org.apache.spark.deploy.history.HistoryServer
Settings
| Setting | Default Value | Description |
|---|---|---|
| spark.history.ui.port | | The port of the History Server’s UI. |
| | | The directory with the event logs. The directory has to exist before starting History Server. |
| | | How many Spark applications to retain. |
| | (unbounded) | How many Spark applications to show in the UI. |
| spark.history.kerberos.enabled | | Enable security when working with HDFS with security enabled (Kerberos). |
| | (empty) | Kerberos principal. Required when |
| | (empty) | Keytab to use for login to Kerberos. Required when |
| spark.history.provider | | The fully-qualified class name for an ApplicationHistoryProvider. |
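
The settings note that the event log directory has to exist before History Server starts. A minimal pre-flight sketch, assuming a local filesystem path held in a hypothetical EVENT_LOG_DIR variable; the default path below is illustrative and must match whatever directory your History Server is configured to read.

```shell
# EVENT_LOG_DIR is a hypothetical variable for this sketch; point it at the
# directory your History Server is configured to read event logs from.
EVENT_LOG_DIR="${EVENT_LOG_DIR:-${TMPDIR:-/tmp}/spark-events}"

# Create the directory (and any missing parents) if it is not there yet.
if [ ! -d "$EVENT_LOG_DIR" ]; then
  mkdir -p "$EVENT_LOG_DIR"
fi

# Fail early when the directory still cannot be used.
if [ -d "$EVENT_LOG_DIR" ] && [ -r "$EVENT_LOG_DIR" ]; then
  echo "event log directory ready: $EVENT_LOG_DIR"
else
  echo "event log directory missing or unreadable: $EVENT_LOG_DIR" >&2
  exit 1
fi
```

This check only covers local paths; a directory on HDFS or another remote filesystem would need that filesystem's own tooling.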